Chi-squared Amplification: Identifying Hidden Hubs

Authors

  • Ravi Kannan
  • Santosh Vempala
Abstract

We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs). Entries in rows outside $S$ are generated from the Gaussian distribution $p_0 \sim N(0, \sigma_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0, \sigma_1^2)$, $\sigma_1 > \sigma_0$, and the rest of the entries from $p_0$. The special rows with higher-variance entries can be viewed as hidden higher-degree hubs. The problem we address is to identify them efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k \ge c\sqrt{n \ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0 = \pm 1$ (with probability 1/2 each) and $p_1 \equiv 1$, neither barrier has been broken, in spite of much effort, particularly for the submatrix version, which is called the Planted Clique problem. Here, we break both these barriers for the general model with Gaussian entries. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-\delta}$ for some $\delta > 0$, when $\sigma_1^2 > 2\sigma_0^2$. The algorithm extends easily to the setting where planted entries might have different variances, each at least as large as $\sigma_1^2$. We also show a nearly matching lower bound: for $\sigma_1^2 \le 2\sigma_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0, \sigma_0^2)$ and a matrix with $k = n^{0.5-\delta}$ hidden hubs, for any $\delta > 0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $\sigma_1^2 = 2\sigma_0^2$, we show that the general hidden hubs problem can be solved for $k \ge c\sqrt{n}(\ln n)^{1/4}$, improving on the naive row sum-based method.
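To make the model and the variance threshold concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes three things that the abstract does not spell out: that the chi-squared distance is $\chi^2(p_1 \| p_0) = E_{p_0}[(p_1/p_0)^2] - 1$, which for two centered Gaussians is finite exactly when $\sigma_1^2 < 2\sigma_0^2$; that each hub row picks its $k$ planted columns uniformly at random; and that the naive "row sum" baseline sums squared entries. All function names are hypothetical.

```python
import math
import random


def chi_squared_divergence(sigma0, sigma1):
    """chi^2(p1 || p0) = E_{p0}[(p1/p0)^2] - 1 for p_i = N(0, sigma_i^2).

    Assumed definition; the closed form follows from a Gaussian integral.
    """
    gap = 2.0 * sigma0**2 - sigma1**2
    if gap <= 0.0:
        return math.inf  # the integral of p1^2 / p0 diverges
    return sigma0**2 / (sigma1 * math.sqrt(gap)) - 1.0


def sample_hidden_hubs(n, k, sigma0, sigma1, rng):
    """Draw an n x n matrix from the model; return (matrix, set of hub rows)."""
    hubs = set(rng.sample(range(n), k))
    matrix = []
    for i in range(n):
        # Assumption: each hub row plants its k entries in uniformly random columns.
        planted = set(rng.sample(range(n), k)) if i in hubs else set()
        row = [rng.gauss(0.0, sigma1 if j in planted else sigma0) for j in range(n)]
        matrix.append(row)
    return matrix, hubs


def top_rows_by_energy(matrix, k):
    """Naive baseline (assumed form): the k rows with the largest sum of squares."""
    energy = [sum(x * x for x in row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: energy[i], reverse=True)
    return set(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    matrix, hubs = sample_hidden_hubs(n=60, k=30, sigma0=1.0, sigma1=5.0, rng=rng)
    print(top_rows_by_energy(matrix, 30) == hubs)
    print(chi_squared_divergence(1.0, 1.2), chi_squared_divergence(1.0, 1.5))
```

The demo parameters are deliberately in an easy regime (large $k$ and a large variance ratio), where the baseline suffices. The regime the abstract targets, $k$ below $\sqrt{n}$, is where this baseline is expected to fail.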

Similar resources

The Hidden Hubs Problem

We introduce the following hidden hubs model $H(n, k, \sigma_0, \sigma_1)$: the input is an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs); entries in rows outside $S$ are generated from the Gaussian distribution $p_0 = N(0, \sigma_0^2)$, while for each row in $S$, an unknown subset of $k$ of its entries are generated from $p_1 = N(0, \sigma_1^2)$, $\sigma_1 > \sigma_0$, and the rest of the entries from $p_0$. The special rows with hi...

Tuning Windowed Chi-Squared Detectors for Sensor Attacks

A model-based windowed chi-squared procedure is proposed for identifying falsified sensor measurements. We employ the widely used static chi-squared and the dynamic cumulative sum (CUSUM) fault/attack detection procedures as benchmarks to compare the performance of the windowed chi-squared detector. In particular, we characterize the state degradation that a class of attacks can induce to the sy...

CHI-SQUARED DISTANCE AND METAMORPHIC VIRUS DETECTION A Thesis

By Annie H. Toderici. Malware are programs that are designed with a malicious intent. Metamorphic malware change their internal structure each generation while still maintaining their original behavior. As metamorphic malware become more sophisticated, it is important to develop efficient and accurate detection techniques. Current commercial a...

The Limiting Distribution of a Test for Multivariate Structure

We define a chi-squared statistic for p-dimensional data as follows. First, we transform the data to remove the correlations between the p variables. Then we discretize each variable into groups of equal size and compute the cell counts in the resulting p-way contingency table. Our statistic is just the usual chi-squared statistic for testing independence in a contingency table. Because the cel...

A Comparison of Stealthy Sensor Attacks on Control Systems

As more attention is paid to security in the context of control systems and as attacks occur to real control systems throughout the world, it has become clear that some of the most nefarious attacks are those that evade detection. The term stealthy has come to encompass a variety of techniques that attackers can employ to avoid detection. Here we show how the states of the system (in particular...


Publication date: 2016